AITopics | zero-shot performance

Equilibrium Policy Generalization: AReinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Neural Information Processing SystemsJun-16-2026, 22:32:41 GMT

Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion game (PEG), as an important class of real-world games from the fields of robotics and security, requires exponential time to be accurately solved. When the underlying graph structure varies, even the state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zeroshot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios.

machine learning, natural language, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe > United Kingdom (0.29)
Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Neural Information Processing SystemsJun-12-2026

Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion game (PEG), as an important class of real-world games from the fields of robotics and security, requires exponential time to be accurately solved. When the underlying graph structure varies, even the state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios.

large language model, machine learning, natural language, (8 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.41)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.31)

Add feedback

Efficient Equivariant Transfer Learning from Pretrained Models

Neural Information Processing SystemsApr-24-2026, 18:48:58 GMT

Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks even with limited data. Recent works of Basu et al. (2023) and Kaba et al. (2022) propose group averaging (equitune) and optimizationbased methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While Kaba et al. (2022) are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better quality features for certain transformations than others and simply averaging them is deleterious. Hence, we propose λ-equitune that averages the features using importance weights, λs. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that λ-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of Kaba et al. (2022) used with appropriate loss functions, which we call equizero, also gives excellent zero-shot and finetuned performance.

Add feedback

Compressing Large Language Models using Low Rank and Low Precision Decomposition

Neural Information Processing SystemsMar-21-2026, 21:18:17 GMT

This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm that harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$. Here, $\mathbf{L}$ and $\mathbf{R}$ are low rank factors, and the entries of $\mathbf{Q}$, $\mathbf{L}$ and $\mathbf{R}$ are quantized. The model is compressed by substituting each layer with its $\mathbf{Q} + \mathbf{L}\mathbf{R}$ decomposition, and the zero-shot performance of the compressed model is evaluated. Additionally, $\mathbf{L}$ and $\mathbf{R}$ are readily amenable to low-rank adaptation, consequently enhancing the zero-shot performance.

large language model, mathbf, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

A Appendix A.1 UniBench Implementation Details We have developed UniBench

Neural Information Processing SystemsFeb-16-2026, 19:08:51 GMT

To evaluate new VLMs that expand beyond the already implemented 59 VLMs, users need to follow Code Snippet 2. Users would need to create a class that inherent from As described in Section 2.2, LLM-style models defined as models that generate tokens/text as output. Thereby, making them hard to compare with CLIP-style VLMs. Following Matsuura et al. [2023] methodology, we evaluated Llava 1.5 [Liu et al., 2023] - a LLM-style VLM - on various benchmark types in UniBench (Table 2). Scaling improves many benchmarks, but offers little benefit for reasoning and relation. Figure 8: Benchmark capabilities performance does not scale with dataset and model size Median zero-shot performance of models on various benchmark capabilities.

benchmark, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)

Add feedback

Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents

Neural Information Processing SystemsFeb-16-2026, 12:58:11 GMT

Honguk Woo is the corresponding author.

domain factor, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Robots (0.96)
(2 more...)

Add feedback

Checklist 1. For all authors (a)

Neural Information Processing SystemsFeb-16-2026, 00:22:13 GMT

Do the main claims made in the abstract and introduction accurately reflect the paper's If you ran experiments (e.g. for benchmarks)... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Y es] See A.2 (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [Y es] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) For a detailed description and intended uses, please refer to 1. A.2 Dataset Accessibility We plan to host and maintain this dataset on HuggingFace. A.4 Dataset Examples Example question-answer pairs are provided in Tables 9 10 11, . Example Question "What does the symbol mean in Equation 1?" Answer "The symbol in Equation 1 represents "follows this distribution". "Can you provide more information about what is meant by'generative process in "The generative process refers to Eq. (2), which is a conceptual equation representing Question "How does the DeepMoD method differ from what is written in/after Eq 3?" Answer "We add noise only to Question "How to do the adaptive attack based on Eq.(16)? "By Maximizing the loss in Eq (16) using an iterative method such as PGD on the end-to-end model we attempt to maximize the loss to cause misclassification while Question "How does the proposed method handle the imputed reward?" "The proposed method uses the imputed reward in the second part of Equation 1, "Table 2 is used to provide a comparison of the computational complexity of the "Optimal number of clusters affected by the number of classes or similarity between "The authors have addressed this concern by including a new experiment in Table 4 of Question "Can you clarify the values represented in Table 1?" Answer "The values in Table 1 represent the number of evasions, which shows the attack "The experiments in table 1 do not seem to favor the proposed method much; softmax Can the authors explain why this might be the case?" Answer "The proposed method reduces to empirical risk minimization with a proper loss, and However, the authors hope that addressing concerns about the method's theoretical Question "Does the first row of Table 2 correspond to the offline method?"

artificial intelligence, machine learning, social media, (14 more...)

Neural Information Processing Systems

Country: